Abstract

We address the problem of visual recognition from multiple observations
of the same physical object, which can be generated under different
conditions, such as frames at different time instances or snapshots from
different viewpoints. We formulate the multi-observation visual
recognition task as a joint sparse representation model and take
advantage of the correlations among the multiple observations for
classification using a novel joint dynamic sparsity prior. The proposed
joint dynamic sparsity prior promotes a shared joint sparsity pattern
among the multiple sparse representation vectors at class-level, while
allowing distinct sparsity patterns at atom-level within each class in
order to facilitate a flexible representation. The proposed method can
handle both homogeneous and heterogeneous data within the same
framework. Extensive experiments on various visual classification tasks,
including face recognition and generic object classification,
demonstrate that the proposed method outperforms existing
state-of-the-art methods.


1.  Introduction 

The recent dramatic increase in different kinds of visual data has
created a surge in demand for effective processing and analysis
algorithms. For instance, a video camera can generate multiple
observations of the same object at different time instances; a camera
network can capture the same subject from different viewpoints; systems
with heterogeneous sensors (e.g., visible light cameras, infrared
cameras and laser range finders) can generate heterogeneous visual data
for the same physical object. All these scenarios pose great challenges
to existing data processing techniques and require new schemes for
effective data processing. In particular, object recognition and
classification from multiple observations are of great use in numerous
applications (e.g., surveillance, law enforcement). However, most
existing techniques are designed for single observation based
classification, which is clearly not optimal because it fails to exploit
the correlations among the multiple observations of the same physical
object.

Figure 1. Illustration of the joint dynamic sparse representation based
multi-observation recognition framework. The observation ensemble
contains multiple observations generated under different conditions such
as viewpoints. Each observation can be sparsely represented by
potentially different training images from the same class, thus the
sparse representation vectors share the same sparsity pattern at class
level but are distinct at atom level. Classification is achieved via the
total reconstruction error of all the observations.

In this paper, we propose a novel Joint Dynamic Sparse Representation
based Classification (JDSRC) method for multi-observation based visual
recognition. The problem of recovering the sparse linear representation
of a single query datum with respect to a set of reference data
(dictionary) has recently received wide interest in the image
processing, computer vision and pattern recognition communities [3, 10].
Extensions that recover the sparse representations of multiple query
data samples jointly have also been investigated and applied to the
multi-task visual recognition problem in [12], where the multiple tasks
(features) are assumed to have the same sparsity pattern in their sparse
representation vectors. The proposed JDSRC method exploits the
correlations among the multiple observations using a novel joint dynamic
sparsity prior to improve the performance of a recognition system, under
the assumption that the sparse representation vectors of multiple
observations have the same sparsity pattern at class level, but not
necessarily at atom level. The proposed algorithm can therefore not only
exploit the correlations among the observations but is also more
flexible than the assumption of a common atom-level sparsity pattern.
Moreover, the JDSRC method is very general and can handle both
homogeneous and heterogeneous data within the same framework. Taking
multi-view object
recognition as an example, Figure 1 depicts and motivates our JDSRC
method. Given a set of test observations from different viewpoints for a
given object “cat”, we first perform joint dynamic sparse representation
of this observation ensemble with respect to a dictionary of training
images and then assign the observation ensemble to the class which gives
the minimum total reconstruction error. As the multiple observations
describe the same physical object “cat”, the recovered sparse
representation vectors tend to have the same sparsity pattern at
class-level, ideally with non-zero coefficients only associated with
images of “cat” in the dictionary; on the other hand, since the multiple
observations are captured from different viewpoints, the atom-level
sparsity patterns of the representation vectors are not necessarily the
same, tending to have non-zero coefficients associated with training
images of similar viewpoints, as depicted in Figure 1. We refer to the
property that multiple sparse representation vectors share a sparsity
pattern at class-level but not necessarily at atom-level as joint
dynamic sparsity. Using this property, the proposed JDSRC method
achieves several goals: (1) it combines the information from each
observation for discrimination during the joint sparse recovery process
rather than in post-processing, and can thus potentially avoid the risk
of making an erroneous decision for each observation when treated
independently; (2) it exploits the correlations among all the
observations and can handle both homogeneous and heterogeneous tasks;
(3) the joint dynamic sparsity model adopted in JDSRC enables more
flexible and adaptive atom selection for joint sparse representation.

The rest of this paper is organized as follows. In Section 2, we briefly
review some related works. We introduce the JDSRC model and present an
efficient algorithm for solving it in Section 3. Experiments are carried
out on various datasets in Section 4. We conclude this paper in
Section 5.

2.  Related  Works 

We  will  first  review  the  sparse  representation  based 
method  for  single  observation  based  classification  and  then 
discuss  its  recent  extension  to  multiple  observations. 

2.1.  Single  Observation  based  Classification  via 
Sparse  Representation 

Recently, a Sparse Representation based Classification (SRC) method for
single image based face recognition was developed in [11]. This method
casts the task of face recognition as one of classifying between linear
regression models via sparse representation. It is based on the simple
assumption that a new test sample $\mathbf{y}$ from the $c$-th class
lies in the same subspace as the training samples (atoms) of the same
class $\mathbf{A}_c = [\mathbf{a}_{c,1}, \mathbf{a}_{c,2}, \cdots, \mathbf{a}_{c,N_c}]$,
and thus can be well represented by a linear combination of the training
samples from $\mathbf{A}_c$:

$$\mathbf{y} = x_{c,1}\mathbf{a}_{c,1} + x_{c,2}\mathbf{a}_{c,2} + \cdots = \mathbf{A}_c\mathbf{x}_c. \tag{1}$$

As the class label of the test image $\mathbf{y}$ is unknown, we can
recover the representation vector $\mathbf{x}$ for $\mathbf{y}$ with
respect to the whole training set $\mathbf{A}$, which should be sparse
by assumption, thus naturally leading to a sparse representation problem
over $\mathbf{A}$ [11]:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 \le \epsilon, \tag{2}$$

where $\epsilon$ is the reconstruction error parameter,
$\mathbf{A} = [\mathbf{A}_1, \mathbf{A}_2, \cdots, \mathbf{A}_C] \in \mathbb{R}^{d \times N}$
is the dictionary collecting training samples from all $C$ classes and
$\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \cdots, \mathbf{x}_C^T]^T \in \mathbb{R}^{N}$
is the representation vector in terms of $\mathbf{A}$.
$N = \sum_{c=1}^{C} N_c$ is the total number of training samples. After
recovering $\hat{\mathbf{x}}$, the class label for $\mathbf{y}$ is
determined by the minimum reconstruction error criterion, projecting the
test sample onto each class:

$$\hat{c} = \arg\min_{c} \|\mathbf{y} - \mathbf{A}\delta_c(\hat{\mathbf{x}})\|_2^2, \tag{3}$$

where $\delta_c(\cdot)$ is a vector operator keeping the elements
corresponding to the $c$-th class while setting all others to zero.
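
The decision rule in (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [11]: the $\ell_0$ problem (2) is approximated here with Orthogonal Matching Pursuit, a standard greedy surrogate that is swapped in purely for illustration, and the function names `omp` and `src_classify` are hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal Matching Pursuit, used here as an assumed greedy
    stand-in for the l0-constrained problem (2)."""
    x = np.zeros(A.shape[1])
    support = []
    residual = y.astype(float).copy()
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:  # no new atom is selected, stop early
            break
        support.append(j)
        # least-squares fit restricted to the selected atoms
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - A @ x
    return x

def src_classify(A, labels, y, sparsity):
    """Class-wise reconstruction error rule of Eq. (3)."""
    labels = np.asarray(labels)
    x = omp(A, y, sparsity)
    classes = np.unique(labels)
    errors = []
    for c in classes:
        delta_c = np.where(labels == c, x, 0.0)  # keep class-c coefficients only
        errors.append(np.sum((y - A @ delta_c) ** 2))
    errors = np.array(errors)
    return classes[int(np.argmin(errors))], errors

# Toy usage: two classes spanning orthogonal subspaces.
A = np.eye(4)
labels = np.array([0, 0, 1, 1])
y = np.array([0.6, 0.8, 0.0, 0.0])
label, errors = src_classify(A, labels, y, sparsity=2)
```

In the toy usage the test sample lies entirely in the span of the class-0 atoms, so the class-0 reconstruction error vanishes and the sample is assigned to class 0.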

2.2.  Classification  via  Joint  Sparse  Representation 
for  Multiple  Observations 

In the presence of multiple observations, applying SRC to each
observation separately is clearly sub-optimal because it fails to
exploit the correlations among the multiple observations. It should be
more robust to perform sparse representation simultaneously for all the
observations, combining the information from all of them during sparse
recovery by applying joint constraints to their sparse representation
vectors. Recently, several extensions have been made to generalize SRC
to handle multiple observations. Denoting the dictionary associated with
the $k$-th observation $\mathbf{y}^k$ as $\mathbf{A}^k$ (also referred
to as the observation-dictionary), $k \in \{1, 2, \cdots, K\}$, [12]
proposed a Multi-Task Joint Sparse Representation Classification
(MTJSRC) method for multiple feature based classification:

$$
\begin{aligned}
\hat{\mathbf{X}} &= \arg\min_{\mathbf{X}} \frac{1}{2}\sum_{k=1}^{K}\Big\|\mathbf{y}^k - \sum_{c=1}^{C}\mathbf{A}_c^k\mathbf{x}_c^k\Big\|_2^2 + \lambda\sum_{c=1}^{C}\|\mathbf{x}_c\|_2 \\
&= \arg\min_{\mathbf{X}} \frac{1}{2}\sum_{k=1}^{K}\|\mathbf{y}^k - \mathbf{A}^k\mathbf{x}^k\|_2^2 + \lambda\sum_{c=1}^{C}\|\mathbf{x}_c\|_2,
\end{aligned}
\tag{4}
$$

where $\mathbf{x}_c = [\mathbf{x}_c^{1T}, \cdots, \mathbf{x}_c^{KT}]^T \in \mathbb{R}^{KN_c}$
is the collection of the representation vectors associated with class
$c$ across all the $K$ observations/features.
$\mathbf{A}^k = [\mathbf{A}_1^k, \mathbf{A}_2^k, \cdots, \mathbf{A}_C^k] \in \mathbb{R}^{d_k \times N}$
denotes the dictionary for the $k$-th observation and
$\mathbf{x}^k = [\mathbf{x}_1^{kT}, \mathbf{x}_2^{kT}, \cdots, \mathbf{x}_C^{kT}]^T \in \mathbb{R}^{N}$
is the associated representation vector.
$\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \cdots, \mathbf{x}^K] \in \mathbb{R}^{N \times K}$
is the collection of the multiple sparse representation vectors. Using
this model, the recovered sparse representation vectors will have the
same sparsity pattern, not only at class-level, but at atom-level as
well. The classification decision is the class which gives the lowest
reconstruction error accumulated over all the $K$ observations:

$$\hat{c} = \arg\min_{c} \sum_{k=1}^{K} w^k \|\mathbf{y}^k - \mathbf{A}^k\delta_c(\hat{\mathbf{x}}^k)\|_2^2, \tag{5}$$

where $w^k$ is the weight reflecting the confidence in the $k$-th
observation, which can be learned from data.
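
To make the two terms of (4) concrete, the following Python sketch evaluates the objective for a given coefficient matrix. It is only an illustration of the formula as written above: the function name `mtjsrc_objective` and the toy inputs are assumptions, and no solver is implied.

```python
import numpy as np

def mtjsrc_objective(A_list, y_list, X, labels, lam):
    """Evaluate the objective of Eq. (4) for a given X (hypothetical helper).

    A_list[k]: (d_k, N) dictionary of the k-th observation,
    y_list[k]: (d_k,) k-th observation, X: (N, K) coefficients,
    labels: (N,) class label of every atom, lam: regularization weight.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # data-fit term: half the summed squared reconstruction errors
    fit = 0.0
    for k, (A, y) in enumerate(zip(A_list, y_list)):
        fit += 0.5 * np.sum((y - A @ X[:, k]) ** 2)
    # group term: l2 norm of all class-c coefficients stacked over the
    # K observations, i.e. the Frobenius norm of the class-c row block
    penalty = sum(np.linalg.norm(X[labels == c, :]) for c in np.unique(labels))
    return fit + lam * penalty

# Toy usage with two observations and one atom per class.
A_list = [np.eye(2), np.eye(2)]
y_list = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
X = np.array([[1.0, 0.0], [0.0, 1.0]])
value = mtjsrc_objective(A_list, y_list, X, labels=[0, 1], lam=0.5)
```

In the toy usage the data-fit term contributes 0.5 and each of the two class blocks has unit norm, so the objective evaluates to 0.5 + 0.5 * 2 = 1.5.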

3.  Joint  Dynamic  Sparse  Representation  based 
Multi-observation  Recognition 

3.1.  Problem  Formulation 

The MTJSRC method [12] generalizes SRC to multiple observations by
assuming that all the observations share the same set of selected atoms
for sparse representation (Figure 2 (b)), which is reasonable in the
case of multiple features from the same datum. However, in more general
multiple observation cases, due to the variation of observation
conditions, e.g., viewpoints, each observation may be better represented
by a different set of atoms from the same class, as illustrated in
Figure 1; assuming that the observations from different viewpoints can
be represented by the same set of training samples is therefore
inappropriate. Rather, the desired sparse representation vectors for the
multiple observations should share the same class-level sparsity pattern
while their atom-level sparsity patterns may be distinct, i.e., they
should follow joint dynamic sparsity, as shown in Figure 2 (c). One of
the key ingredients in our JDSRC model for promoting joint dynamic
sparsity is the dynamic active set. A dynamic active set
$\mathbf{g}_s \in \mathbb{R}^{K}$ refers to the indices of a set of
coefficients corresponding to the same class in the coefficient matrix
$\mathbf{X}$, which are activated jointly during sparse representation
of multiple observations. Each dynamic active set $\mathbf{g}_s$
contains one and only one index for each column of $\mathbf{X}$, where
$\mathbf{g}_s(k)$ is the index for the $k$-th column of $\mathbf{X}$, as
shown in Figure 2 (c).
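
The difference between a shared atom-level pattern and a shared class-level pattern can be checked numerically. The snippet below is a small sketch, with hypothetical helper names `shares_atom_support` and `shares_class_support`, that tests both properties on a coefficient matrix whose rows are labeled by class.

```python
import numpy as np

def shares_atom_support(X, tol=0.0):
    """True if every column of X has non-zeros at exactly the same rows."""
    S = np.abs(np.asarray(X, dtype=float)) > tol
    return bool(np.all(S == S[:, [0]]))

def shares_class_support(X, labels, tol=0.0):
    """True if every column of X activates exactly the same set of classes."""
    S = np.abs(np.asarray(X, dtype=float)) > tol
    labels = np.asarray(labels)
    # a class is active in column k if any of its atoms is non-zero there
    active = np.array([S[labels == c].any(axis=0) for c in np.unique(labels)])
    return bool(np.all(active == active[:, [0]]))

labels = [0, 0, 1, 1]
# Two observations using different atoms of class 0.
X_dynamic = np.array([[1.0, 0.0],
                      [0.0, 2.0],
                      [0.0, 0.0],
                      [0.0, 0.0]])
# Two observations using the same atom of class 0.
X_joint = np.array([[1.0, 2.0],
                    [0.0, 0.0],
                    [0.0, 0.0],
                    [0.0, 0.0]])
```

Here `X_dynamic` shares its support at class level only, which is the pattern that joint dynamic sparsity permits, while `X_joint` also shares it at atom level.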

Figure 2. Illustration of different sparsity models for the coefficient
matrix $\mathbf{X}$. Each column denotes a representation vector and
each square block denotes a representation coefficient value in the
corresponding representation vector. A white block denotes a zero entry.
Colored blocks denote non-zero values. (a) Separate sparse
representation: the sparse representation vectors may be quite different
due to the separate recovery process. (b) Joint sparsity: the sparse
coefficient vectors share the same patterns (selecting the same atoms),
but with different coefficient values. (c) Joint dynamic sparsity: the
sparse coefficient vectors select different atoms within each
class-dictionary to represent each of the observations.

We formulate our JDSRC model as a multivariate regression problem with a
novel joint dynamic sparsity promoting term, which is derived in the
sequel. The following properties are desired in designing such a term:
(i) cues from multiple observations should be combined during joint
sparse representation, thus enhancing the robustness of joint sparse
recovery; (ii) sparsity across dynamic active sets should be promoted,
thus inducing a joint dynamic sparsity pattern over the recovered
multiple sparse representation vectors. To combine the strength of all
the atoms within a dynamic active set (and thus across all the
observations), we apply the $\ell_2$-norm over each dynamic active set;
to promote sparsity, i.e., to allow only a small number of dynamic
active sets to be involved in joint sparse representation, we apply the
$\ell_0$-norm across the $\ell_2$-norms of the dynamic active sets.
Therefore, we arrive at the following joint dynamic sparsity promoting
term:


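$$\|\mathbf{X}\|_G = \Big\| \big[ \|\mathbf{x}_{\mathbf{g}_1}\|_2, \|\mathbf{x}_{\mathbf{g}_2}\|_2, \cdots \big] \Big\|_0, \tag{6}$$

where $\mathbf{x}_{\mathbf{g}_s}$ denotes the vector formed as the
collection of the coefficients associated with the $s$-th dynamic active
set $\mathbf{g}_s$:
$\mathbf{x}_{\mathbf{g}_s} = \mathbf{X}(\mathbf{g}_s) = [\mathbf{X}(\mathbf{g}_s(1), 1), \mathbf{X}(\mathbf{g}_s(2), 2), \ldots, \mathbf{X}(\mathbf{g}_s(K), K)]^T \in \mathbb{R}^{K}$.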
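
As a small numerical illustration of (6), the sketch below gathers the coefficients of each dynamic active set and counts the sets with non-zero $\ell_2$ energy. The function name `joint_dynamic_norm` is hypothetical, indices are 0-based (unlike the 1-based notation of the text), and the dynamic active sets are assumed to be given.

```python
import numpy as np

def joint_dynamic_norm(X, active_sets):
    """Number of dynamic active sets with non-zero l2 energy, as in Eq. (6).

    X: (N, K) coefficient matrix.
    active_sets: (S, K) integer array; row s holds one row index of X
                 per column, i.e. the dynamic active set g_s.
    """
    X = np.asarray(X, dtype=float)
    G = np.asarray(active_sets, dtype=int)
    cols = np.arange(X.shape[1])
    # gathered[s, k] = X(g_s(k), k)
    gathered = X[G, cols]
    energies = np.linalg.norm(gathered, axis=1)  # l2 norm within each set
    return int(np.count_nonzero(energies)), energies  # l0 "norm" across sets

X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0],
              [0.0, 0.0]])
sets = np.array([[0, 1],   # g_1 picks row 0 in column 0 and row 1 in column 1
                 [2, 3]])  # g_2 picks row 2 in column 0 and row 3 in column 1
count, energies = joint_dynamic_norm(X, sets)
```

Only the first set carries energy here, so the count is 1 even though the two columns use different atoms.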


To recover the sparse representation coefficient matrix $\mathbf{X}$
with the joint dynamic sparsity property for the multiple observations
$\{\mathbf{y}^k\}_{k=1}^{K}$, we propose the following Joint Dynamic
Sparse Representation (JDSR) model:

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \sum_{k=1}^{K} \|\mathbf{y}^k - \mathbf{A}^k\mathbf{x}^k\|_2^2 \quad \text{s.t.} \quad \|\mathbf{X}\|_G \le S, \tag{7}$$

where $K$ is the total number of observations and $S$ is the sparsity
level. The use of the joint dynamic sparsity regularization term
$\|\mathbf{X}\|_G$ has the following advantages:

•  The $\ell_2$-norm is applied over each dynamic active set, which
combines the cues from all the observations during joint sparse
representation; moreover, allowing each dynamic active set to be
adaptive within the same class is both more flexible and more
reasonable, since the multiple observations are different measurements
of the same physical object;

•  The $\ell_0$-norm is applied across the dynamic active sets, thus
encouraging the selection of the most parsimonious and representative
dynamic active sets, which promotes a joint sparsity pattern shared at
class-level while allowing the within-class sparsity patterns to be
distinct, facilitating the selection of the most representative atoms
for each observation class-wise.

3.2.  An  Efficient  Algorithm  for  Joint  Dynamic 
Sparse  Representation 

The JDSR model (7) is very challenging to solve due to the co-existence
of the $\ell_0$-norm and the joint dynamic sparsity property. We propose
to solve (7) with a greedy JDSR algorithm, as detailed in Algorithm 1.
The proposed JDSR algorithm has a similar algorithmic structure to
SOMP [9] and CoSOMP [2], and includes the following general steps:
(i) select new candidates based on the current residue; (ii) merge the
newly selected candidate set with the previously selected atom set;
(iii) estimate the representation coefficients based on the merged atom
set; (iv) prune the merged atom set to a specified sparsity level based
on the newly estimated representation coefficients; (v) update the
residue. This procedure is iterated until certain conditions are
satisfied [2]. We use $\mathbf{X}(:, i)$ to denote the $i$-th column of
$\mathbf{X}$ and $\mathbf{X}(:, \mathbf{i})$ to denote all the columns
indexed by the index vector $\mathbf{i}$ (similarly for the rows). The
major difference between JDSR and CoSOMP [2] lies in the atom selection
criterion used in steps (i) and (iv) of Algorithm 1, which is detailed
in the sequel.
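
Steps (iii) and (v) are ordinary least-squares operations applied per observation. The following Python sketch shows one way they might be written; the names `update_coefficients` and `residues` are hypothetical, indices are 0-based, and the residue follows the sign convention of Algorithm 1.

```python
import numpy as np

def update_coefficients(A_list, y_list, I):
    """Step (iii): least squares restricted to the selected atoms.

    A_list[k]: (d_k, N) dictionary, y_list[k]: (d_k,) observation,
    I: (L, K) index matrix; column k lists the atoms kept for observation k.
    """
    N, K = A_list[0].shape[1], len(A_list)
    C = np.zeros((N, K))
    for k in range(K):
        idx = np.unique(I[:, k])  # drop repeated indices in this column
        coef, *_ = np.linalg.lstsq(A_list[k][:, idx], y_list[k], rcond=None)
        C[idx, k] = coef
    return C

def residues(A_list, y_list, X):
    """Step (v): r^k = A^k X(:, k) - y^k for every observation."""
    return [A @ X[:, k] - y for k, (A, y) in enumerate(zip(A_list, y_list))]

# Toy usage with identity dictionaries.
A_list = [np.eye(3), np.eye(3)]
y_list = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
I = np.array([[0, 1],
              [2, 2]])
C = update_coefficients(A_list, y_list, I)
r = residues(A_list, y_list, C)
```

With identity dictionaries the fitted coefficients simply copy the selected entries of each observation, and the residue is non-zero only at the entries that were not selected.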

At each iteration of JDSR (steps (i) and (iv)), given a coefficient
matrix $\mathbf{Z} \in \mathbb{R}^{N \times K}$, we need to select the
$L$ most representative dynamic active sets from $\mathbf{Z}$, i.e., to
construct the best approximation $\mathbf{Z}_L$ to $\mathbf{Z}$ with $L$
dynamic active sets (i.e., $\|\mathbf{Z}_L\|_G = L$). This can be
obtained as the solution to the following problem:

$$\hat{\mathbf{Z}}_L = \arg\min_{\mathbf{Z}_L \in \mathbb{R}^{N \times K}} \|\mathbf{Z} - \mathbf{Z}_L\|_F \quad \text{s.t.} \quad \|\mathbf{Z}_L\|_G \le L. \tag{8}$$

The solution to (8) can be obtained by a procedure called the Joint
Dynamic Sparsity mapping (JDS mapping):

$$\mathbf{I}_L = P_{JDS}(\mathbf{Z}, L), \tag{9}$$

which gives the index matrix $\mathbf{I}_L \in \mathbb{R}^{L \times K}$
containing the top-$L$ dynamic active sets for all the $K$ observations,
as detailed in Algorithm 2. In each iteration, the JDS mapping selects a
new dynamic active set in three steps: (i) find the maximum absolute
coefficient for each class and each observation; (ii) combine the
maximum absolute coefficients across the observations for each class as
the total response; (iii) select the dynamic active set as the one which
gives the maximum total response. After a joint dynamic active set is
determined, we record the selected indices as one row of $\mathbf{I}_L$
and set the associated coefficients in the coefficient matrix to zero to
ensure that none of these coefficients will be selected again. This
procedure is iterated until the specified number of dynamic active sets
has been determined. After that, $\mathbf{Z}_L$ can be obtained by
keeping the entries of $\mathbf{Z}$ selected by $\mathbf{I}_L$ and
setting the remaining entries to zero. As mentioned above, Algorithm 2
is used as a sub-routine for dynamic active set selection in each
iteration of Algorithm 1, and this iteration process is repeated on the
residue until certain conditions are satisfied [2, 9].

Algorithm 1: Joint Dynamic Sparse Representation (JDSR) based
Classification (JDSRC).

Input: observation set $\{\mathbf{y}^k\}_{k=1}^{K}$, dictionary set
       $\{\mathbf{A}^k\}_{k=1}^{K}$, sparsity level $S$, observation
       number $K$
Output: class label $\hat{c}$

while stopping criteria false do
    $\mathbf{E}(:, k) = {\mathbf{A}^k}^T \mathbf{r}^k, \ \forall k = 1, 2, \cdots, K$;
    % (i) atom selection via joint dynamic sparse mapping
    $\mathbf{I}_{new} \leftarrow P_{JDS}(\mathbf{E}, 2S)$;
    % (ii) index matrix updating
    $\mathbf{I} \leftarrow [\mathbf{I}^T, \mathbf{I}_{new}^T]^T$;
    % (iii) representation coefficients updating
    for $k = 1, 2, \cdots, K$ do
        $\mathbf{i} \leftarrow \mathbf{I}(:, k)$;
        $\mathbf{C}(\mathbf{i}, k) \leftarrow (\mathbf{A}^k(:, \mathbf{i})^T \mathbf{A}^k(:, \mathbf{i}))^{-1} \mathbf{A}^k(:, \mathbf{i})^T \mathbf{y}^k$;
    % (iv) atom pruning via joint dynamic sparse mapping
    $\mathbf{I} \leftarrow P_{JDS}(\mathbf{C}, S)$;
    $\mathbf{X} \leftarrow \mathbf{0}$;
    for $k = 1, 2, \cdots, K$ do
        $\mathbf{i} \leftarrow \mathbf{I}(:, k)$, $\mathbf{X}(\mathbf{i}, k) \leftarrow \mathbf{C}(\mathbf{i}, k)$;
        % (v) residue updating
        $\mathbf{r}^k = \mathbf{A}^k \mathbf{X}(:, k) - \mathbf{y}^k$;
for $k = 1, 2, \cdots, K$ do
    $\mathbf{i} \leftarrow \mathbf{I}(:, k)$;
    $\mathbf{X}(\mathbf{i}, k) \leftarrow (\mathbf{A}^k(:, \mathbf{i})^T \mathbf{A}^k(:, \mathbf{i}))^{-1} \mathbf{A}^k(:, \mathbf{i})^T \mathbf{y}^k$;
    % reconstruction
    $\hat{\mathbf{y}}_c^k = \mathbf{A}^k \delta_c(\mathbf{X}(:, k))$;
% total reconstruction error
$e_c = \sum_{k=1}^{K} w^k \|\mathbf{y}^k - \hat{\mathbf{y}}_c^k\|_2^2$;
% class label estimation
$\hat{c} = \arg\min_c e_c$.

3.3.  Classification  Rule 

After recovering the sparse representation matrix
$\hat{\mathbf{X}} = [\hat{\mathbf{x}}^1, \hat{\mathbf{x}}^2, \cdots, \hat{\mathbf{x}}^K]$
for all the observations $\{\mathbf{y}^k\}_{k=1}^{K}$ of the same
physical object via JDSR, we make a decision on the class label jointly
for all the observations based on $\hat{\mathbf{X}}$, using the total
reconstruction error criterion:

$$\hat{c} = \arg\min_{c} \sum_{k=1}^{K} w^k \|\mathbf{y}^k - \mathbf{A}^k\delta_c(\hat{\mathbf{x}}^k)\|_2^2, \tag{10}$$

where $\{w^k\}_{k=1}^{K}$ are the confidence weights for the
observations. By using the total reconstruction error for
classification, we again combine the cues from all the observations. The
overall procedure of JDSRC is summarized in Algorithm 1.
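
The rule in (10) can be sketched as follows. This is a minimal illustration with a hypothetical function name `classify_ensemble`; the weights default to equal values, which is an assumption of this sketch, and the coefficient matrix is taken as given rather than computed by JDSR.

```python
import numpy as np

def classify_ensemble(A_list, y_list, X, labels, weights=None):
    """Total reconstruction error rule of Eq. (10).

    A_list[k]: (d_k, N) dictionary, y_list[k]: (d_k,) observation,
    X: (N, K) recovered coefficients, labels: (N,) class label per atom.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = len(A_list)
    w = np.ones(K) if weights is None else np.asarray(weights, dtype=float)
    errors = np.zeros(len(classes))
    for ci, c in enumerate(classes):
        mask = labels == c
        for k in range(K):
            x_c = np.where(mask, X[:, k], 0.0)  # delta_c(x^k)
            errors[ci] += w[k] * np.sum((y_list[k] - A_list[k] @ x_c) ** 2)
    return classes[int(np.argmin(errors))], errors

# Toy usage: both observations are explained mostly by class-0 atoms,
# but by different atoms within that class.
A_list = [np.eye(4), np.eye(4)]
labels = [0, 0, 1, 1]
y_list = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.1])]
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.1]])
label, errors = classify_ensemble(A_list, y_list, X, labels)
```

In the toy usage the accumulated error is 0.01 for class 0 and 2.0 for class 1, so the ensemble is assigned to class 0.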
Algorithm 2: Joint Dynamic Sparsity Mapping $P_{JDS}(\mathbf{Z}, L)$

Input: coefficient matrix $\mathbf{Z}$, desired number of dynamic active
       sets $L$, label vector $\mathbf{L}$ for atoms in the dictionary,
       number of classes $C$, number of observations $K$
Output: index matrix $\mathbf{I}_L$ for the top-$L$ dynamic active sets

Initialize: $\mathbf{I}_L \leftarrow \emptyset$  % initialize the index matrix as empty;

for $l = 1, 2, \cdots, L$ do
    for $c = 1, 2, \cdots, C$ do
        % get the index vector for the c-th class
        $\mathbf{c} \leftarrow \text{find}(\mathbf{L}, c)$;
        for $k = 1, 2, \cdots, K$ do
            % (i) find the maximum absolute value v and its index t
            %     for the c-th class, k-th observation
            $[v, t] \leftarrow \max(|\mathbf{Z}(\mathbf{c}, k)|)$;
            $\mathbf{V}(c, k) \leftarrow v$, $\tilde{\mathbf{I}}(c, k) \leftarrow \mathbf{c}(t)$;
        % (ii) combine the max-coefficients for each class
        $\mathbf{s}(c) \leftarrow \sqrt{\sum_{k=1}^{K} \mathbf{V}(c, k)^2}$;
    % (iii) find the best cluster of atoms belonging to the same class
    %       across all the classes
    $[v, t] = \max(\mathbf{s})$;
    $\mathbf{I}_L(l, :) = \tilde{\mathbf{I}}(t, :)$, $\mathbf{Z}(\tilde{\mathbf{I}}(t, :)) \leftarrow \mathbf{0}^T$.
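
The mapping can be prototyped directly from the three steps above. The following Python function is a sketch under stated assumptions rather than the authors' code: the name `jds_mapping` is hypothetical, indices are 0-based, the per-class responses are combined with the $\ell_2$ norm, and $L$ is assumed not to exceed the number of non-trivial dynamic active sets.

```python
import numpy as np

def jds_mapping(Z, labels, L):
    """Greedy selection of the top-L dynamic active sets (sketch).

    Z: (N, K) coefficient matrix, labels: (N,) class label per atom.
    Returns an (L, K) integer matrix; row l holds, for every observation k,
    the row index of Z selected in the l-th dynamic active set.
    """
    W = np.abs(np.asarray(Z, dtype=float))  # work on a copy of |Z|
    labels = np.asarray(labels)
    classes = np.unique(labels)
    K = W.shape[1]
    cols = np.arange(K)
    I_L = np.zeros((L, K), dtype=int)
    for l in range(L):
        best_score, best_rows = -1.0, None
        for c in classes:
            rows = np.flatnonzero(labels == c)  # atoms of class c
            block = W[rows, :]
            t = np.argmax(block, axis=0)        # (i) best atom per observation
            v = block[t, cols]
            score = np.linalg.norm(v)           # (ii) total response of class c
            if score > best_score:              # (iii) keep the strongest class
                best_score, best_rows = score, rows[t]
        I_L[l] = best_rows
        W[best_rows, cols] = 0.0                # selected entries cannot be reused
    return I_L

Z = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.0, 0.3]])
I_L = jds_mapping(Z, labels=[0, 0, 1, 1], L=2)
```

In the toy usage the first selected set is rows (0, 1), i.e. two different atoms of class 0, one for each observation; the second is rows (2, 2) from class 1.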


4.  Experiment  Results 

In this section, we evaluate the performance of the proposed JDSRC
method on several visual classification applications. Specifically, we
carry out experiments on multi-region based face recognition,
multi-instance based face recognition and multi-view visual recognition.
To verify the effectiveness of the proposed method, we compare it with
several state-of-the-art methods, including SRC [11], MTJSRC [12], the
Mutual Subspace Method (MSM) [4] and the Affine Hull (AFH) method for
set based classification [1]. The weight for each observation can be
learned via a learning procedure. For the applications demonstrated in
the sequel, all the observations can be regarded as equally important
for classification, so all the weights are set to be equal.

4.1.  Multi-Region  Face  Recognition 

Local region/patch based face recognition methods have proven effective
in the literature. In this subsection, we treat each region from a face
image as a single observation. Eight regions ($K = 8$) are manually
selected in this experiment, as illustrated in Figure 3 (a): (left,
right) brows, eyes, cheeks, nose and mouth, thus inducing a
heterogeneous recognition task. Since different observations have
different properties and cannot be matched with each other, a dedicated
observation-dictionary is required for each region. The $k$-th
observation-dictionary $\mathbf{A}^k$ is constructed from the
corresponding $k$-th region of all the training images.

4.1.1  Holistic SRC, Separate-region SRC and Multi-region Joint Dynamic
Sparse Representation

In this illustrative experiment, we compare the recovered sparse
representation vector(s) using: SRC on the holistic face, SRC on each
region separately and the proposed JDSRC method. For illustration, we
select 5 classes from the Extended Yale B dataset [ ], where each class
contains 32 gallery faces. Representative faces for each class are shown
in Figure 3 (d). The probe face, which belongs to class 3, is shown in
Figure 3 (b). The probe face was captured under an extremely
low-illumination condition, so for better visualization an enhanced
version is shown in Figure 3 (c). We infer the label for the probe face
with the holistic sparse representation, separate sparse representation
on each region and the proposed JDSRC method. The results are shown in
Figures 4 to 6. For holistic SRC, the recovered sparse representation
vector and the reconstruction errors are shown in Figure 4. As can be
seen, this method predicts the probe face as belonging to class 1, which
is incorrect. For separate SRC, as each region is treated independently,
the sparse representation vectors for different regions are quite
distinct, as shown in Figure 5; although some regions prefer the correct
label, the overall decision is again class 1, which is incorrect. The
proposed JDSRC method can combine the cues from all the 8 regions during
sparse representation by matching each region with the corresponding
region of different gallery images of the same person, thus providing a
more robust class label prediction. As shown in Figure 6, the recovered
sparse coefficients are mostly concentrated at the correct class (class
3, black) while the within-class non-zero supports are different,
indicating that each region matches with different gallery images of the
same person, which makes the representation more flexible. The final
reconstruction error achieves its minimum at class 3, which is the
correct label for the probe face image.

4.1.2  Multi-Region  Face  Recognition 

In this subsection, we compare the recognition performance of the JDSRC
method with SRC [11] and MTJSRC [12] on the Extended Yale B dataset [ ]
(192 $\times$ 168 pixels). The partitions depicted in Figure 3 (a) are
used. We follow the experimental setup in [11] for a fair comparison.
Specifically, all the 2414 frontal views of 38 individuals are used and
are resized to 24 $\times$ 21. Half of the images, randomly sampled from
the whole database, are used for training and the rest for testing. We
set the sparsity level as $S = 25$. Recognition rates for different
algorithms under this setting are summarized in Table 1. As can be seen
from Table 1, the proposed JDSRC method clearly outperforms Nearest
Neighbor (NN), linear SVM (SVM) as well as holistic SRC [11] and
MTJSRC [12] on multiple regions. The performance of the different
algorithms under different numbers of training samples is depicted in
Figure 7 (a), which demonstrates that the proposed JDSRC method
consistently outperforms holistic SRC. We also examine the performance
of each algorithm under different image sizes with down-sampling factors
$r \in \{24, 16, 8, 6, 4\}$. The results are shown in Figure 7 (b). As
can be seen, decreasing the down-sampling factor (i.e., increasing the
dimensionality of the features) increases the recognition rates of all
the algorithms. The behaviors of the different algorithms are, however,
different. The best accuracy for NN under the highest feature dimension
is still lower than that of all the other algorithms under the lowest
feature dimension. SVM achieves a relatively low accuracy at the lowest
feature dimension and improves quickly as the dimension increases. SRC
achieves a relatively higher accuracy at the lowest feature dimension,
but its performance improves slowly as the dimensionality increases. The
proposed JDSRC method also achieves a high recognition accuracy at the
lowest feature dimension, approximately the same as SRC. Moreover, as
the feature dimension increases, JDSRC improves quickly and achieves a
recognition accuracy of over 99% when the feature dimension is larger
than 504, which clearly outperforms SRC.

Figure 3. Face images from the Extended Yale B dataset. (a) The 8
selected regions used in our experiments: (left, right) brows, eyes,
cheeks, nose and mouth. Some face images used in Subsection 4.1.1:
(b) original probe face, (c) enhanced probe face for visualization,
(d) representative faces from the gallery set.

4.2.  Multi-Instance  Face  Recognition 

In  this  experiment,  we  consider  the  scenario  of  hav¬ 
ing  multiple  instances  of  a  subject  for  classification,  as  in 
the  case  of  multiple  frames  generated  from  video  cameras 
which  is  a  typical  scenario  in  surveillance.  In  such  an 
unconstrained  environment,  the  captured  face  images  may 
have  large  intra-class  pose  variations.  UMIST  face  database 


Figure  4.  Holistic  face  sparse  representation,  (a)  sparse  represen¬ 
tation  coefficient  plot  (b)  reconstruction  error  bar  plot 


g  -0.05 

I  -0  -1 


Trainning  Sample  Index 


Tralnnlng  Sample  Index 


gj  0.15 

1  0.1 
3 


50  100  15 

Trainning  Sample  Index 


lepresentation  Coefficients 

I 

4  ^ 

“  ( 

3  50  100  150 

Trainning  Sample  Index 

lepresentation  Coefficient: 

1 1 

( 

3  50  100  150 

Trainning  Sample  Index 

Representation  Coefficient 

o  p 

l 

rx 

100  150 

Trainning  Sample  Index 


0  50  100  150 

Trainning  Sample  Index 


0  50  100  150 

Trainning  Sample  Index 


Figure  5.  Separate  regions  based  face  sparse  representation:  8 
sparse  coefficients  plots  and  reconstruction  error  bar  plot. 


Figure 6. Joint dynamic sparse representation: 8 sparse coefficient plots (versus training sample index) and reconstruction error bar plot (versus class index).


Table 1. Multi-region face recognition accuracy (%) on the Extended Yale B database with feature dimension d = 504.

Algorithm      Recognition Accuracy
NN             59.85
SVM            93.59
SRC [11]       97.10
MTJSRC [12]    98.05
JDSRC          99.34

Figure 7. Recognition accuracy plots on Extended Yale B: (a) recognition accuracy for different numbers of training samples; (b) recognition accuracy for different feature dimensions.


Figure 8. Sample images from the UMIST database for a single subject with varying poses.


Table 2. Multi-instance face recognition accuracy (%) on UMIST.

Algorithm      2 Views   3 Views   4 Views   Avg.
MSM [4]        93.5      95.0      96.5      95.0
AFH [1]        93.0      95.5      97.0      95.2
MTJSRC [12]    94.5      95.5      98.0      96.0
JDSRC          95.5      97.5      98.0      97.0

is used in this experiment. It consists of 564 images of 20 individuals (mixed race/gender) [7]. Each individual is shown in a range of poses from profile to frontal views, as shown in Figure 8. We randomly select 10 images per individual to construct the observation-dictionary, which is shared by all the observations. For testing, we regard each image as a single observation and carry out experiments with different numbers of observations (K ∈ {2, 3, 4}) selected randomly from the remaining images of each individual. We set the sparsity level to S = 5. Experimental results are summarized in Table 2. Because the multiple observations are unlikely to have exactly the same pose, they are likely to match different sets of training faces of the same subject in the gallery. The MTJSRC method cannot handle this case well, as the results in Table 2 also reveal. The proposed JDSRC method achieves the highest average accuracy (97.0%) and matches or exceeds every other method for each number of observations.
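
The sampling protocol for this experiment can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_gallery_and_probe`, the dictionary-of-lists data layout, the toy image identifiers, and the seed are all assumptions. Only the 10 gallery images per individual and the choice of K observations from the remaining images come from the text.

```python
import random

def split_gallery_and_probe(images_by_subject, n_gallery=10, k_obs=3, seed=0):
    """Randomly split each subject's images into gallery atoms and K test observations.

    images_by_subject: dict mapping subject id -> list of image identifiers
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    gallery, probes = {}, {}
    for subject, images in images_by_subject.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        # The first n_gallery images form this subject's part of the dictionary.
        gallery[subject] = shuffled[:n_gallery]
        # The K observations are drawn from the images not used in the gallery.
        remaining = shuffled[n_gallery:]
        probes[subject] = rng.sample(remaining, k_obs)
    return gallery, probes

# Toy stand-in: 20 subjects with 28 image ids each (real per-subject counts vary).
toy = {s: [f"s{s:02d}_img{i:02d}" for i in range(28)] for s in range(20)}
gallery, probes = split_gallery_and_probe(toy, n_gallery=10, k_obs=3)
```

Drawing the observations only from the images left over after the gallery split keeps the dictionary and the test observations disjoint for every subject.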

4.3. Multi-View Visual Recognition

We apply JDSRC to visual recognition from multiple view images. First, we use the ALOI dataset,

Figure 9. ALOI database. Left: sample images from the ALOI database. Right: 72 different viewpoints for a specific object.


Table 3. Multi-view object classification accuracy (%) on ALOI.

Algorithm      2 Views   4 Views   6 Views   Avg.
MSM [4]        97.1      97.1      100.0     98.1
AFH [1]        94.3      71.4      57.1      74.3
MTJSRC [12]    90.0      94.3      98.6      94.3
JDSRC          97.1      98.6      100.0     98.6

which is an image collection of 1000 small objects [5], with systematically varied viewing angle, illumination angle, and illumination color for each object. Sample images and an illustration of the 72 viewpoints for each object are shown in Figure 9. In this experiment, we select a subset of 70 classes to keep the computational cost of the evaluation manageable. Images from 6 viewpoints corresponding to view angles Θ_train = {0°, 60°, 120°, 180°, 240°, 300°} are used for training. We test the different algorithms by randomly selecting K ∈ {2, 4, 6} views from the remaining viewpoints for each object, so that training and testing images are recorded from different viewpoints. We set the sparsity level to S = 5. The results are summarized in Table 3. JDSRC attains the highest average accuracy (98.6%), which further verifies the effectiveness of the proposed method compared with the other methods and demonstrates its applicability to general visual classification tasks.
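
The view-selection step of this protocol can be sketched as follows. This is an illustration under stated assumptions: the 72 viewpoints are taken to be uniformly spaced in 5° steps, and the helper name `sample_test_views` and the seed are hypothetical. The training angles and the values of K are those given in the text.

```python
import random

# 72 viewpoints, assuming uniform 5-degree steps around the object.
ALL_ANGLES = list(range(0, 360, 5))
# Training view angles stated in the text.
TRAIN_ANGLES = [0, 60, 120, 180, 240, 300]
# Test views may only come from viewpoints not used for training.
TEST_POOL = [a for a in ALL_ANGLES if a not in TRAIN_ANGLES]

def sample_test_views(k, rng):
    """Pick K distinct test viewpoints that never coincide with a training viewpoint."""
    return sorted(rng.sample(TEST_POOL, k))

rng = random.Random(0)
views_by_k = {k: sample_test_views(k, rng) for k in (2, 4, 6)}
```

Under the 5° assumption the test pool holds 66 viewpoints, so every test observation is seen from an angle that is absent from the dictionary.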

We further apply our JDSRC method to multi-view face recognition using the CMU Multi-PIE database [ >], which contains a large number of face images under different illuminations, viewpoints and expressions, recorded in up to 4 sessions over the span of several months. Subjects were imaged by 13 cameras at head height, spaced at 15° intervals, under 20 illumination conditions. In our experiment, the face regions for all poses are extracted manually and resized to 45 × 35. We choose the first 50 classes that are present in all 4 sessions. Due to the symmetry of human faces, we consider only 7 different poses with view angles Θ = {0°, 15°, 30°, 45°, 60°, 75°, 90°}. Four view angles Θ_train = {0°, 30°, 60°, 90°} from Session 1 are used for training, while all 7 view angles in Θ from Sessions 2 to 4 are used for testing. This is a more realistic setting in the sense that the data sets used for training and testing are collected separately, and not all of the viewpoints in the test sets are available for training. Images with expressions are not used in our experimental

Figure 10. Recognition rate under different (a) numbers of views (d = 128) and (b) feature dimensions (K = 4).

evaluation. We set the sparsity level to S = 5 and use random projection for dimensionality reduction [11].
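
The random projection used for dimensionality reduction can be sketched as follows. This is a minimal sketch, assuming a Gaussian projection matrix with entries scaled by 1/sqrt(d); the text does not specify the distribution or scaling, and the function name `random_projection_matrix`, the seeds, and the random stand-in images are hypothetical. The 45 × 35 image size and d = 128 come from the text.

```python
import numpy as np

def random_projection_matrix(d, n, seed=0):
    """Return a d x n Gaussian matrix whose entries are scaled by 1/sqrt(d),
    so that squared norms are preserved in expectation."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d, n)) / np.sqrt(d)

n_pixels = 45 * 35   # length of a vectorized 45 x 35 face image (1575)
d = 128              # one of the feature dimensions examined in the experiments
R = random_projection_matrix(d, n_pixels)

# Stand-in data: 10 random "images" stored as columns (not real face data).
faces = np.random.default_rng(1).random((n_pixels, 10))
# The same matrix R must be applied to dictionary atoms and to test observations.
features = R @ faces
```

Fixing the seed matters in practice: the dictionary and every test observation have to be projected with the same matrix, otherwise their features are not comparable.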

To generate a test sample with K views, we first randomly select a subject c ∈ {1, 2, ..., 50} from the test set and then randomly select K ∈ {1, 2, ..., 7} different views imaged at the same time instance for subject c. We generate 1000 test samples with this scheme. For SRC, the sparse representation procedure is performed for each view separately, and a single decision is then made based on the recovered coefficient vectors using (10). The MTJSRC method [12] is not compared in this experiment, because its assumption of an identical sparsity pattern across observations is inappropriate for this task and limits its performance (as can also be observed in Table 3). The recognition results on Session 2 are shown in Figure 10 (a). The multi-view based methods (K > 1) outperform their single-view counterparts (K = 1) by a large margin, indicating the advantage of using multiple views in face recognition. Furthermore, the performance of all the algorithms improves as the number of views increases, and the proposed method outperforms all the other methods for every number of views. We also examine the effect of the data (feature) dimensionality d on the recognition rate. The test samples are generated using Θ with K = 4. We vary the data dimensionality over d ∈ {32, 64, 128, 256} and show in Figure 10 (b) the performance of all the algorithms on the Session 2 data set. The proposed JDSRC method performs best at every feature dimensionality examined. Finally, we evaluate the performance of all the algorithms on different sessions from Multi-PIE. The recognition results on the Session 2 to 4 data sets with d = 128 are summarized in Table 4. The proposed JDSRC method outperforms all the other algorithms on every test session.
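
The step of turning K recovered coefficient vectors into a single decision can be sketched as follows. Equation (10) is not reproduced in this section, so the sketch assumes the usual residual-based rule: pick the class whose atoms give the smallest total reconstruction error over all K observations. The function name `classify_by_total_residual`, the toy dictionary, and the hand-set coefficients are hypothetical, and no sparse solver is included.

```python
import numpy as np

def classify_by_total_residual(Y, D, X, labels):
    """Assign the class whose atoms best reconstruct all K observations jointly.

    Y: (d, K) observations, D: (d, N) dictionary,
    X: (N, K) recovered sparse coefficients, labels: (N,) class of each atom.
    """
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        # Keep only the coefficients of class c and measure the reconstruction error.
        residuals.append(np.linalg.norm(Y - D[:, mask] @ X[mask, :], ord="fro"))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals

# Toy example: 3 classes with 4 atoms each, and K = 2 observations of class 1.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 12))
labels = np.repeat(np.arange(3), 4)
X = np.zeros((12, 2))
X[4, 0] = 1.0   # view 1 uses the first atom of class 1
X[6, 1] = 0.8   # view 2 uses a different atom of the same class
Y = D @ X
predicted, residuals = classify_by_total_residual(Y, D, X, labels)
```

In the toy example the two views select different atoms within the same class, which is the situation the joint dynamic sparsity prior is designed to allow. The residual is computed per class, so the decision does not depend on which atoms inside the class were chosen.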

5.  Conclusion 

A  novel  joint  dynamic  sparse  representation  based  visual 
recognition  method  is  presented  in  this  paper.  This  method 
inherits  the  robustness  of  the  sparse  representation  based 


Table 4. Multi-view face recognition rate (%) on different test sessions of the CMU Multi-PIE database (C = 50, d = 128, K = 4).

Algorithm      Session 2   Session 3   Session 4
MSM [4]        87.4        81.0        76.9
AFH [1]        87.8        82.5        78.3
SRC [11]       90.4        88.5        85.6
JDSRC          92.6        91.6        86.7

classification method while also exploiting the correlations among the multiple observations. Moreover, the novel joint dynamic sparsity model allows more flexible atom selection for joint sparse representation, which facilitates recognition. Experimental comparisons with state-of-the-art methods on various visual recognition tasks verify the effectiveness of the proposed method. In future work, we would like to address theoretical aspects of the proposed method and to explore other applications, such as multi-modal visual classification.